ABSTRACT


This report describes the continuing development of scanning,
preprocessing, character-classification, and context-analysis techniques
for hand-printed text, such as computer coding sheets in the FORTRAN
language.

Both edge-detection and topological preprocessing are coupled with
classification by a learning machine and used to process a large file of
characters printed by a single author. The two systems are combined to
achieve a recognition rate considerably better than our previous results.
No other comparable results on unconstrained hand printing with a full
alphabet are known to us.

The same methods are also applied to a well-known file of hand-printed
characters collected by Highleyman. The combination of preprocessing
and classification methods has achieved performance better than that
reported for any other recognition system.


I  INTRODUCTION


This report describes the continuing development of scanning,
preprocessing, character-classification, and context-analysis techniques
for hand-printed text. The particular subject matter of our investigation
is hand-printed FORTRAN text on standard computer coding sheets, with a
46-character alphabet. The reader is referred to the previous reports of
this project for background and supplementary material.

In Sec. II, we describe a single author's file of 2,999 hand-printed
characters, used to continue the intra-author recognition experiments
beyond the preliminary experiments described in the last Quarterly
Report. The TOPO 3-CALM and PREP-CALM preprocessor-classifier systems
were applied to this file, and performance was observed far exceeding
any previously seen in multi-author experiments.

In Sec. III, we show the results of combining the action of the systems
treated in Sec. II. The combined system recognized independent test data
with 97-percent accuracy and no rejects. This is our best recognition
score to date, and we know of no comparable results reported for the
recognition of unconstrained hand printing with a full alphabet.

A collection of experiments on a well-known set of hand-printed data
collected by Highleyman is described in Sec. IV. The PREP-CALM system
performed considerably better than any of several previously reported
methods, none of which involved extensive preprocessing of the data.




II  INTRA-AUTHOR EXPERIMENTS ON THE JM DATA FILE


A. Introduction

In Sec. III of the preceding Quarterly Report, we described several
limited experiments on hand printing from a single author. These
experiments indicated a great reduction in error rate, compared to the
rates obtained in multiple-author experiments to date. We concluded that
"The results of these experiments should be considered somewhat
tentative ... the test samples were statistically small ... the data
were taken from coding sheets in which 20 alphabets were written on
successive lines at one sitting."

We have now performed the follow-on experiments pointed to in the
preceding report, using a large file of data including training and test
data from actual coding sheets. These experiments have borne out the
dramatic improvement in recognition-test error rate suggested by the
earlier experiments.

B. The JM Data File

The JM data file consisted of 2,999 characters in the 46-category
FORTRAN alphabet, hand-printed by John Munson. This author was chosen
as the source of the file because of the existence of a number of actual
coding sheets prepared by him on the proper forms during the development
of SDS 910 FORTRAN computer programs.

The first 920 character patterns in the file were the 20 alphabets
(Sequence Nos. 50-69) used for the previously reported intra-author
experiments. Added to these were 2,079 characters gathered from four
separate coding sheets, written at different times over a period of a
few months. Each line on a coding sheet was given a unique sequence
number, ranging from 1,000 to 1,111.

The first five alphabets in the file (Sequence Nos. 50-54, patterns
1-230) were reserved for possible testing but were not used. The training
set contained 1,727 patterns. It consisted of the remaining 15 alphabets
(Sequence Nos. 55-69, patterns 231-920) and 1,037 characters of text
(Sequence Nos. 1,000-1,056, patterns 921-1,957). The test set contained
1,042 characters taken from two coding sheets (Sequence Nos. 1,057-1,111,
patterns 1,958-2,999). About one-third of the test data came from the
same sheet as some of the training data; the remainder came from a
separate sheet that was written separately from any of the training
data.

The  inclusion  of  the  hand-printed  alphabets  in  the  training  data 
ensured  that  each  of  the  46  character  types  would  be  represented.  The 
character  types  were  not  evenly  represented  in  the  text  material.  Their 
appearance  was  determined  fortuitously  by  the  text  that  happened  to  be 
chosen. 

The  same  training  and  test  sets  were  employed  throughout  the  several 
experiments  to  be  described. 

C.  Legibility  of  the  JM  Text 

A  fragment  of  the  actual  test  data  is  shown  in  Fig.  1.  It  may  be 
seen  that  the  printing  is  fairly  legible;  it  is  by  no  means  highly  regular. 
The  printing  was  done  with  a  little  care,  but  with  no  labored  attention  to 
the  quality  of  individual  characters.  The  coder  was  actually  preparing  a 
program  text  for  keypunching,  although  aware  that  the  sheet  might  some  day 
be  used  in  recognition  experiments.  Thus,  although  the  test  data  were  not 
completely  "candid"  data,  they  were  generated  under  conditions  that  closely
model  a  system  in  which  workers  were  preparing  material  for  machine  input. 

Ten human subjects were asked to classify the test set characters,
which were presented (in random order) in quantized form on the cathode
ray tube attached to the SDS 910 computer. The average error rate was
0.72 percent; assuming a normal distribution of scores, the "true" error
rate was 0.72 ± 0.17 percent with 95-percent confidence. (If the 10
responses for each character were used to reach a group decision, only 2
errors [0.2 percent] were made. This would indicate that the individual
errors were largely uncorrelated.) These rates do not include the few
typographical errors made by the subjects in typing their responses. The
rates also do not include six patterns found to be mislabeled in the test
data file.

FIG. 1  A FRAGMENT OF HAND-PRINTED TEXT FROM A SINGLE AUTHOR


D. TOPO 3-CALM Experiment 1

The complete JM file was preprocessed by the SDS 910 computer program
TOPO 3. TOPO 3 was a minor revision of the TOPO 2 program that has been
described in earlier reports. TOPO 3 was rearranged to make it run
considerably faster than TOPO 2, and the set of features in the output
feature vector was slightly different. For all practical purposes,
however, the topological features produced by TOPO 3 (describing
character enclosures, concavities, stroke tips, profiles, size, and so
on) were the same as those from TOPO 2.

The output feature vectors from TOPO 3 were processed by the CALM
learning-machine simulation, which implemented a 46-category linear
machine. The training and test sets defined above were used.

The results of the experiment are shown in Fig. 2. For the first 5
iterations, training was performed only on the 690 characters from the 15
alphabets within the training set. Thereafter, the full training set was
used. Test readings were not taken until after the fifth training
iteration. At the time of the first test reading (Iteration 5), the
machine had only been trained on characters from the alphabets. The test
error rate dropped from 13 percent to 10 percent between the fifth and
sixth iterations, owing to the expansion of the training set to include
the text characters.

The training error rate reached 6 percent in 10 iterations, and the
test error rate reached 9 percent. The test error rate was approximately
the same as that of TOPO 2-CALM Experiment A, described in the preceding
Quarterly Report, in which much smaller training and test sets were used.
The larger amount of training data compensated for the increase in
difficulty of recognizing characters from text on actual coding sheets,
compared with the characters in alphabets.

E. PREP-CALM Experiment 11

The JM file was preprocessed by the computer simulation of the
edge-detecting preprocessor, PREP 24A. In this run, the patterns were
preprocessed in only one view. The resulting feature vectors were
presented to CALM for processing in PREP-CALM Experiment 11.

The  results  are  shown  in  Fig.  3.  As  in  Experiment  TOPO  3-CALM  1, 
only  the  690  alphabet  patterns  were  used  for  training  in  the  first  5 
iterations,  and  the  full  training  and  test  sets  were  used  thereafter. 

The  training  error  rate  reached  1  percent;  the  test  error  rate  reached 
12  percent. 

F.  PREP-CALM  Experiment  12 

In PREP-CALM Experiment 12, PREP 24A was used to preprocess each
hand-printed pattern in nine different views. The advantage of nine-view
over one-view preprocessing with the edge-detecting masks has been shown
in other experiments previously reported during this project.

In running CALM on the nine-view preprocessed feature vectors, nine
training iterations were first performed over the entire training set.
During this sequence of iterations, each view of each training pattern
was presented once for training. The test patterns were then presented
for nine-view testing. In this case, the classification was done "by
views." As each view was presented, the learning machine was forced to
make a category decision. A vote was taken among the nine single-view
decisions to produce the final decision.
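
The voting procedure can be sketched in a few lines. The following is a minimal illustration and not the CALM program itself: the function names, the toy two-category weights, and the rule that a tie goes to the category voted for first are all assumptions made for the example.

```python
from collections import Counter


def dpu_sums(weight_vectors, view):
    """Dot Product Unit sum of one view for every category."""
    return [sum(w * x for w, x in zip(weights, view)) for weights in weight_vectors]


def classify_view(weight_vectors, view):
    """Forced single-view decision: index of the category with the largest sum."""
    sums = dpu_sums(weight_vectors, view)
    return max(range(len(sums)), key=sums.__getitem__)


def classify_by_views(weight_vectors, views):
    """Plurality vote among the single-view decisions.

    Assumption: a tie goes to the category that was voted for first.
    """
    votes = Counter(classify_view(weight_vectors, view) for view in views)
    return votes.most_common(1)[0][0]


# Toy example (hypothetical data): two categories, three views with +1/-1 components.
toy_weights = [[1, 0], [0, 1]]
toy_views = [[1, -1], [1, -1], [-1, 1]]
decision = classify_by_views(toy_weights, toy_views)  # two votes to one for category 0
```

In the experiment itself there would be 46 weight vectors and nine views per pattern; the toy sizes are only for readability.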

The  sequence  of  nine  training  iterations  followed  by  a  nine-view  test 
iteration  was  repeated  three  times.  The  following  results  were  obtained: 

                             Iteration 9   Iteration 18   Iteration 27
Training error rate              15%           11%            10%
Nine-view test error rate         6%            6%             5%

G. PREP-CALM Experiment 12A

Nine-view classification "by categories" is an alternative to
classification "by views." In classification by categories, an
accumulator register is employed for each category. The registers are
initially zeroed and, as each view is presented, the Dot Product Unit
sums are added into the registers for the corresponding categories. After
all views have been presented, the character is assigned to the category
with the largest accumulated total. We have noted previously that the two
methods of multi-view classification yield comparable results (Report
No. 22, the final report for Contract DA 36-039 AMC-03247(E), page 46).

In PREP-CALM Experiment 12A, the weights of the trained learning
machine from Experiment 12 (at the 27th iteration) were reused. The CALM
program was slightly modified to classify the test patterns by categories,
instead of by views. The resultant test error rate was 4 percent, versus
5 percent for the former experiment.
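
Classification by categories can be sketched as follows. This is an illustrative reconstruction under our own naming, not the modified CALM code; the two-category weights and the three views are hypothetical toy data.

```python
def classify_by_categories(weight_vectors, views):
    """Accumulate Dot Product Unit sums over all views, one register per category."""
    registers = [0] * len(weight_vectors)  # registers are initially zeroed
    for view in views:
        for k, weights in enumerate(weight_vectors):
            registers[k] += sum(w * x for w, x in zip(weights, view))
    # The pattern is assigned to the category with the largest accumulated total.
    best = max(range(len(registers)), key=registers.__getitem__)
    return best, registers


# Toy example (hypothetical data): two categories, three views.
toy_weights = [[1, 0], [0, 3]]
toy_views = [[1, -1], [1, -1], [-1, 1]]
category, totals = classify_by_categories(toy_weights, toy_views)
```

Unlike the vote, this rule lets a view with a large margin outweigh several views with small margins, which is one plausible reason the two methods can give slightly different error rates.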

H.  PREP-CALM  Experiment  13 

PREP-CALM Experiment 13 was motivated by the following observation
concerning nine-view testing by categories: the result obtained by
presenting nine different feature vectors (views) and accumulating the
DPU sums can also be obtained by adding together the nine feature vectors,
component by component, and presenting the result as a single feature
vector. In other words, it makes no difference whether the data
representing the nine views are added together at the feature-vector
level or at the DPU sum level. This is a consequence of the linear nature
of the DPU read operation.

The question arises: What would be the effect of applying this change
in policy to the training patterns as well as the test patterns? In order
to answer this question, we accumulated the nine feature vectors for each
pattern in the JM file into a single feature vector. (Because the original
feature vectors had binary components of +1 and -1, the new vector had
components ranging from +9 through -9.)
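
The equivalence follows from the linearity of the dot product and is easy to check numerically. The sketch below uses randomly generated +1/-1 views and integer weights; the vector length of 8, the weight range, and all names are arbitrary choices for the illustration.

```python
import random


def dot(weights, features):
    """Dot Product Unit sum for one weight vector and one feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def accumulate_views(views):
    """Add the views together component by component."""
    return [sum(components) for components in zip(*views)]


random.seed(0)
views = [[random.choice((-1, 1)) for _ in range(8)] for _ in range(9)]
weights = [random.randint(-5, 5) for _ in range(8)]

sum_of_dpu_sums = sum(dot(weights, view) for view in views)
accumulated = accumulate_views(views)
dpu_sum_of_accumulated = dot(weights, accumulated)

# Adding at the DPU-sum level or at the feature-vector level gives the same result.
assert sum_of_dpu_sums == dpu_sum_of_accumulated
# Nine +1/-1 components can only add up to a value between -9 and +9.
assert all(-9 <= component <= 9 for component in accumulated)
```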

The  accumulated  feature  vectors,  arranged  into  the  usual  training  and
test  sets,  were  used  as  input  to  CALM.  In  10  iterations,  the  training
error  rate  reached  O.H  percent.  The  test  error  rate  was  7  percent  after 
5  iterations  and  9  percent  after  10  iterations. 

In view of the equivalence we have just described, the test patterns
for Experiment 12A and for Experiment 13 are effectively identical. The
poorer performance in the latter experiment must be a result of the
different training histories. We hypothesize that the separate
presentation of each view forces the learning machine to "train harder,"
intuitively speaking; that is, more mileage is obtained from the data
because each view represents a separate pattern to challenge the machine.

Thus we have observed that the best performance is obtained by
grouping all the views of the test pattern together and testing by
categories, while using the views of the training patterns separately.


A. Introduction

The technique of combining the TOPO-CALM preprocessor-classifier
system with the PREP-CALM system, in order to reduce the classification
error rate, was anticipated in the Second Quarterly Report of this project
(pages 9-10). The technique was first tested on the limited sample of
single-author data used for our first intra-author experiments, and it
gave a definite improvement in performance (Sixth Quarterly Report, pages
15-16). We have now applied the technique to a more adequate set of test
data, namely, the test data from the JM file described in the preceding
section of this report. The TOPO-CALM system was combined with both the
one-view and nine-view versions of the PREP-CALM system.

B. TOPO 3-CALM Experiment 1 and PREP-CALM Experiment 11 Combined

The first combined experiment was performed by adding together the
learning-machine responses for the test patterns from TOPO 3-CALM
Experiment 1 and those from PREP-CALM Experiment 11. For each test
pattern, the two Dot-Product-Unit sums in each of the 46 categories were
added to form a new set of 46 sums on which the classification decision
was to be based. Prior to the addition, the sets of sums from the two
experiments were scaled by an empirically determined scale factor so that
they would have approximately the same overall range of values and
neither set would overwhelm the other in the addition.
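
A minimal sketch of the parallel combination is given below. The function names, the three-category toy sums, and the scale factor of 0.25 are assumptions for illustration only; the report states only that the scale factor was determined empirically.

```python
def combine_sums(topo_sums, prep_sums, prep_scale):
    """Add the two sets of Dot-Product-Unit sums, category by category.

    prep_scale brings the PREP-CALM sums into roughly the same range as the
    TOPO-CALM sums so that neither set overwhelms the other.
    """
    return [t + prep_scale * p for t, p in zip(topo_sums, prep_sums)]


def decide(sums):
    """Index of the category with the largest sum."""
    return max(range(len(sums)), key=sums.__getitem__)


# Toy example (hypothetical values): three categories instead of 46.
topo_sums = [10.0, 12.0, 3.0]   # TOPO-CALM alone narrowly prefers category 1
prep_sums = [64.0, 16.0, 8.0]   # PREP-CALM alone strongly prefers category 0
combined = combine_sums(topo_sums, prep_sums, prep_scale=0.25)
combined_decision = decide(combined)
```

In this toy case the weak preference of one system is overridden by the strong preference of the other, which is the behavior the combination relies on when the two systems make different errors.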

The  test  error  rate  using  the  combined  sums  was  A  percent.  This 
value  may  be  compared  with  those  from  the  two  experiments  using  the 
individual  machine  combinations,  namely,  9  percent  (TOPO  3-CALM  Experiment  1)
and  12  percent  (PREP-CALM  Experiment  11).

Combining the two preprocessor-classifier systems in parallel is
evidently a powerful method for improving performance. The improvement
implies that the particular errors made by one system are to a
considerable degree independent of the errors made by the other;
otherwise, the combined system would behave much like either of the
individual ones.

C. TOPO 3-CALM Experiment 1 and PREP-CALM Experiment 12A Combined

In order to combine the learning-machine responses to the test data
of TOPO 3-CALM Experiment 1 with those of PREP-CALM Experiment 12A, it was
necessary to condense the nine-view responses of the latter to a single
response. This was done by using the accumulated Dot-Product-Unit sums
(formed during the classification-by-categories process) to represent the
response of the nine-view PREP-CALM system to the pattern as a whole.
Obtaining the accumulated sums for this purpose was, in fact, the prime
motivation for performing Experiment 12A.

The accumulated sums from the PREP-CALM system were scaled and added
to the sums from the TOPO-CALM system, just as in the other combined
experiment described above. Using the combined sums as the basis of
classification, we observed an error rate of 3 percent. This compares
with test error rates of 9 percent for the TOPO 3-CALM system alone and
4 percent for the nine-view PREP-CALM system alone.

By examining the distribution of the difference between the largest
and the second largest combined sums, we obtained a tradeoff curve of
errors vs rejects for the combined system. This curve is presented in
Figure 4. If the reject margin of the combined machine were set, for
example, to reject 3 percent of the test patterns, the error rate would
be reduced to 1-1. 2 percent; beyond this point, the rate of return (in
terms of error reduction) diminishes.
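
The construction of one point on such a tradeoff curve can be sketched as follows. The names and the four-pattern toy data are hypothetical, and two conventions are our own assumptions: a pattern is rejected when its margin falls strictly below the threshold, and both rates are expressed as fractions of all test patterns.

```python
def margin(sums):
    """Difference between the largest and the second largest sum."""
    top, second = sorted(sums, reverse=True)[:2]
    return top - second


def tradeoff_point(results, reject_margin):
    """Return (reject_rate, error_rate) for one setting of the reject margin.

    results is a list of (combined_sums, true_category) pairs.
    """
    rejects = errors = 0
    for sums, true_category in results:
        if margin(sums) < reject_margin:
            rejects += 1  # too close to call: reject rather than classify
        elif max(range(len(sums)), key=sums.__getitem__) != true_category:
            errors += 1
    total = len(results)
    return rejects / total, errors / total


# Toy example (hypothetical data): four patterns, three categories.
toy_results = [
    ([5.0, 1.0, 0.0], 0),  # correct, wide margin
    ([3.0, 2.5, 0.0], 1),  # wrong, narrow margin
    ([1.0, 4.0, 0.0], 1),  # correct, wide margin
    ([2.0, 0.0, 6.0], 0),  # wrong, wide margin
]
no_reject = tradeoff_point(toy_results, reject_margin=0.0)
with_reject = tradeoff_point(toy_results, reject_margin=1.0)
```

Sweeping reject_margin over a range of values and plotting the resulting pairs traces out the curve; errors made with a wide margin, like the last toy pattern, are the ones that rejection cannot remove.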


FIG. 4  TRADEOFF CURVE FOR COMBINED SYSTEMS ON SINGLE-AUTHOR DATA


D. Summary


The  performance  just  described  is  by  far  the  best  performance  that 
we  have  achieved  to  date  on  a  significantly  large  body  of  hand-printed 
data.  To  our  knowledge,  no  reported  experiments  or  operational  systems 
have  achieved  comparable  performance  on  relatively  unconstrained  hand 
printing  with  a  full  alphabet.  Let  us  summarize  briefly  the  factors
pertinent  to  this  result. 

We  have  attempted  elsewhere  in  this  report  to  indicate  the  quality 
of  the  hand-printed  test  data.  We  would  suggest  that  the  quality  is  com¬ 
parable  to  that  expected  of  data  prepared  by  workers  for  machine  input  if 
the  workers  were  reasonably  motivated  but  had  no  particular  training  in 
forming  characters  and  observed  no  detailed  constraints.  The  addition  of 
such  training  and  constraints  should  reduce  the  variability  of  the  printing 
to  a  level  so  low  that  the  same  recognition  system  would  experience  an  error 
rate  of  much  less  than  3  percent.  This  approach  may  be  necessary  for  systems 
in  which  text  recognition  with  good  accuracy  is  to  be  performed  without  the 
aid  of  sophisticated  context  analysis.

Looking  the  other  way,  the  single-author  result  is  far  better  than  the
multi-author  results,  which  give  an  indication  of  the  system's  performance 
with  the  unconstrained  printing  of  an  untutored  population.  Considerable 
education  and  constraint  would  evidently  have  to  be  applied  to  a  popula¬ 
tion  in  order  to  achieve  high  recognition  rates. 

The recognition system has arrived at its present level of performance
through the successive incorporation of several new features, whose
progress has been detailed in many of the previous Quarterly Reports.
Starting with the original PREP and CALM structures, which were
implemented both in hardware and in computer simulations, major additions
have been the nine-view preprocessing, the TOPO preprocessors, and the
parallel combining of preprocessor-classifier systems. Each of these
building blocks plays an important role in the final result.


IV  EXPERIMENTS WITH HIGHLEYMAN'S DATA


A. Introduction

One of the recurring problems in evaluating pattern-recognition
results reported in the literature is that few authors give sufficiently
detailed descriptions of the data they use. This makes it very difficult
to make fair comparisons of different pattern-recognition procedures. One
set of data, however, has been used as a standard of comparison by several
researchers: the set of hand-printed characters collected, quantized, and
encoded by Highleyman. Since these data were readily convertible to
our standard 24 × 24 format, we decided to apply our techniques to them.

Highleyman's data set consists of 50 alphabets of hand-printed
characters. Each alphabet was printed by a different individual, and
each contains 36 characters (the 10 numerals and 26 upper-case letters)
quantized and represented as a 12 × 12 binary (black-white) array. The
great amount of variability encountered in the data has tended to rule
out the simpler approaches, such as the use of decision trees, and the
methods used have been more or less statistical in spirit.

One common characteristic of these methods has been the use of some
or all of the patterns to fix the values of free parameters in the
classifier. In those cases where the first 40 alphabets (the training
data) were used to determine parameters and the last 10 alphabets (the
testing data) were used to provide an independent test, the performance
on the test data was always much worse than the performance on the
training data. For example, Chow obtained a 2.1-percent error rate on the
training data, but a 41.7-percent error rate on independent test, and
this represents the best performance reported to date.

Similar discrepancies have been noted by other investigators
and have usually been attributed to the small number of samples available
for characters having so much variability. There is no doubt that a
larger number of samples would reduce the size of this discrepancy, for
in the case of infinite training and testing sets, the error rates should
be the same. It is not clear, however, how much the test error rate would
be reduced, or how many samples would be needed to estimate the best
achievable performance.

References are listed at the end of this report.

In this section we shall describe the results of three different
experiments with Highleyman's data. The first used a nonparametric
classification procedure that exchanged the need for assumptions about
the pattern distributions for the need for a large number of patterns.
The second used edge-detecting preprocessing prior to classification to
remove some of the variability in the characters and to exploit simple
a priori knowledge about the data. In the third experiment, the ability
of people to recognize the test data was measured to provide an objective
performance standard.

B. Nearest-Neighbor Classification

The use of a nearest-neighbor (NN) machine to classify patterns was
described in the Sixth Quarterly Report. From a statistical standpoint,
the NN rule is a nonparametric decision rule that assigns an unclassified
pattern to the class of the nearest of a set of correctly classified
reference patterns. When the set of reference patterns is large, the
error rate of the NN rule is less than twice the minimum possible error
rate. Specifically, if

    $P_0$ = Bayes probability of error
    $P$ = large-sample NN probability of error
    $N$ = number of classes,

then, under very weak regularity conditions,

    $$P_0 \le P \le 2P_0 - \frac{N}{N-1} P_0^2 ,$$

and these bounds can be shown to be the tightest possible.

When the NN rule was applied to Highleyman's data, the training
patterns were used as the reference patterns for the classification of
the testing data. No preprocessing of the data was performed, each pattern
being viewed as a 144-component binary vector. A test pattern was
classified by measuring the Hamming distance between the test pattern and
each of the 1440 training patterns, and by assigning the test pattern to
the class of the nearest training pattern; ties with patterns in
different classes were broken arbitrarily.
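
The nearest-neighbor rule with Hamming distance can be sketched in a few lines. The names and the four-component toy patterns are hypothetical; here ties are broken in favor of the reference pattern encountered first, which is one arbitrary choice among many.

```python
def hamming(a, b):
    """Number of components in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(references, pattern):
    """Label of the reference pattern closest to the given pattern.

    references is a list of (binary_vector, label) pairs. min() keeps the
    first of several equally near references, so ties go to the earliest one.
    """
    _, label = min(references, key=lambda ref: hamming(ref[0], pattern))
    return label


# Toy example (hypothetical data): 4-component vectors instead of 144.
toy_references = [([0, 0, 1, 1], "A"), ([1, 1, 0, 0], "B")]
toy_label = nearest_neighbor(toy_references, [0, 1, 1, 1])
```

Every test pattern must be compared with every reference pattern, so the cost of classifying one character grows linearly with the size of the training set.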

The error rate resulting from applying this procedure to the testing
data was 47.5 percent. If the training set were large enough for the
large-sample results to hold, this would mean that the minimum error rate
would lie somewhere between 27.6 percent and 47.5 percent. We shall see
that the minimum error rate is probably less than 11.4 percent, and,
hence, that the training data is not a sufficiently large sample in the
nearest-neighbor sense.
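
The lower end of such an interval is obtained by inverting the large-sample bound $P \le 2P_0 - \frac{N}{N-1} P_0^2$, that is, by solving the quadratic for the smallest $P_0$ consistent with the observed NN error rate. The sketch below does this under our own function names, with $N = 36$ classes as an assumed input; the result should land close to the lower figure quoted above, with any small difference attributable to rounding or to the exact form of the bound used.

```python
import math


def nn_error_upper_bound(bayes_error, n_classes):
    """Large-sample upper bound on the NN error rate for a given Bayes error."""
    c = n_classes / (n_classes - 1)
    return 2 * bayes_error - c * bayes_error ** 2


def bayes_error_lower_bound(nn_error, n_classes):
    """Smallest Bayes error consistent with an observed large-sample NN error.

    Solves 2*p0 - c*p0**2 = nn_error for the smaller root, with c = N / (N - 1).
    Valid while c * nn_error <= 1.
    """
    c = n_classes / (n_classes - 1)
    return (1 - math.sqrt(1 - c * nn_error)) / c


lower = bayes_error_lower_bound(0.475, 36)
# Consistency check: pushing the lower bound back through the upper-bound
# formula recovers the observed NN error rate.
recovered = nn_error_upper_bound(lower, 36)
```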

C. Edge-Detecting Preprocessing and Piecewise-Linear Classification

One of the big differences between Highleyman's data and the data
we have been using in our experiments is that broken and fragmented
characters appear frequently in Highleyman's data. This ruled out the
use of the TOPO programs to extract features. However, all that was
needed to use the PREP 24A simulation of the 1024-image optical
preprocessor was to expand the 12 × 12 figures to 24 × 24 figures. This
was done merely by copying each row and column twice.
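
The expansion amounts to replacing each point of the original figure with a 2 × 2 block of the same value. A minimal sketch, with a hypothetical function name and patterns represented as lists of rows:

```python
def expand_twofold(pattern):
    """Double a binary figure in both directions by copying each row and column."""
    expanded = []
    for row in pattern:
        doubled_row = [cell for cell in row for _ in range(2)]  # copy each column
        expanded.append(doubled_row)
        expanded.append(list(doubled_row))                      # copy the row itself
    return expanded


# Toy check on a 2 x 2 figure, then on an all-white 12 x 12 figure.
small = expand_twofold([[1, 0], [0, 1]])
large = expand_twofold([[0] * 12 for _ in range(12)])
```

No new information is created by this step; it only brings the figures to the array size that the preprocessor simulation expects.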

A PREP-CALM experiment was run using the expanded patterns just as
we used our own data in the experiments described in the Second and Third
Quarterly Reports. The 84-bit feature vectors were obtained for 9 views
of each character. These formed the input for the CALM simulation of a
36-category Piecewise-Linear Learning Machine having two Dot Product
Units per category and a training margin of 85.

After 18 iterations of the training data (by which time all views
of all of the training patterns had been encountered twice), testing was
performed. All nine views of each test pattern were presented, and the
class appearing most often among the nine individual responses was
selected for the pattern. The resulting error rate for all 36 classes was
31.7 percent. Repetition of this experiment using the 10 numerals alone
yielded an error rate of 12.0 percent. Both of these results are
significantly better than previously reported results, but this
performance still falls short of human performance.

D. Human Performance

In 1960, Neisser and Weene reported an average error rate of 4.1
percent made by a group of nine people in recognizing hand-printed
upper-case letters and numerals, and they indicated that 3.2 percent was
probably a good estimate of the minimum possible error rate for their
data. These results apply to a 34-category alphabet, since confusions
between I and 1 or between O and 0 were not counted as errors. Most
importantly, the characters used were reproduced photographically with
high resolution and apparently with good gray scale, whereas Highleyman's
data are low-resolution two-level gray-scale figures; thus, these rates
do not apply to Highleyman's data.

To estimate human error rates on Highleyman's data, we performed a
simple, computer-controlled experiment involving 10 people who, though
aware of the existence of Highleyman's data, had not seen the test data
before. The experimental procedure had two phases: a training phase in
which the subjects familiarized themselves with both the equipment and
the data by viewing the training data under test conditions, and a testing
phase in which performance was recorded. In both phases, the characters
were selected randomly without replacement from 10 alphabets printed by
10 different writers; the training phase used the first 10 alphabets,
while the testing phase used the last 10.

The characters were displayed as a 12 × 12 array of points (bright
points for the figure) occupying a 0.3-inch square centered in a
3 • *1. fl-inch oscilloscope screen. Each subject was free to take as
long as he wished in making up his mind, and when a decision was reached
he reported it by striking the corresponding typewriter key. This caused
the subject's decision to be recorded, the correct character to be typed
out if a mistake had been made, and the next character to be displayed.
We chose to maintain the error response during the testing phase because
it noticeably sustained the subject's attention and induced him to
perform well.

Most subjects were satisfied with the training phase after they had
seen 75 to 100 characters, and volunteered to move on to the testing
phase. On the test data, their error rates ranged from 15.6 percent to
18.5 percent, with an average error rate of 15.7 percent. Assuming a
normal distribution of scores, this indicates that with 95-percent
confidence the true mean error rate is 15.7 percent ± 0.9 percent.

These numbers include a fair proportion of errors due to confusions
between I and 1 and O and 0. If these errors are not counted, the mean
error rate drops to 11.5 percent, which is still considerably greater than
the 4.1 percent reported by Neisser and Weene for their unquantized
characters. If the I-1 and O-0 distinctions are retained, but if a
plurality vote of the 10 separate responses is used to classify the
characters (ties being broken arbitrarily), then an error rate of 11.4
percent results. We believe that this value is close to the minimum error
rate achievable with Highleyman's data and that the performance of other
methods on the 36-character test data should be viewed relative to this
standard.

E. Conclusions

The 47.5-percent error rate obtained by nearest-neighbor
classification is typical of the error rates achieved by other general
classification techniques. If 11.4 percent is the minimum achievable
error rate, then the 47.5-percent result indicates that the amount of
training data is much too small for NN classification, and this is
probably true for the other general methods as well.

By employing edge-detecting preprocessing followed by 9-view
classification by a piecewise-linear machine, we obtained an error rate
of 31.7 percent. While this represents a significant improvement over
previously reported results, it is still far too high to be practical.
However, the best performance we can ever expect on Highleyman's data is
approximately 11 percent, which in turn seems to be much too high.

The reason for most of these errors is clear to anyone who has ever
looked at Highleyman's data. Aside from the basic indistinguishability of
O's from 0's and many I's from 1's, most of the difficulty is due to
either inadequate resolution or breaks in the characters. It is extremely
doubtful that more sophisticated preprocessing and classification could
ever overcome these fundamental difficulties. Thus, while Highleyman's
data has served as an interesting vehicle for comparing our
classification methods with others, its basic characteristics severely
limit its usefulness for hand-printed character-recognition research.